Attaching package: 㤼㸱lubridate㤼㸲
The following object is masked from 㤼㸱package:base㤼㸲:
date
Parsed with column specification:
cols(
.default = col_double(),
prosper_rating = [31mcol_character()[39m,
origination_date = [34mcol_datetime(format = "")[39m,
loan_status_description = [31mcol_character()[39m,
loan_default_reason = [33mcol_logical()[39m,
loan_default_reason_description = [33mcol_logical()[39m,
next_payment_due_date = [34mcol_datetime(format = "")[39m,
co_borrower_application = [33mcol_logical()[39m
)
See spec(...) for full column specifications.
266 parsing failures.
row col expected actual file
1957 loan_default_reason 1/0/T/F/TRUE/FALSE 2 'prosper2019.csv'
1957 loan_default_reason_description 1/0/T/F/TRUE/FALSE Bankruptcy 'prosper2019.csv'
3041 loan_default_reason 1/0/T/F/TRUE/FALSE 3 'prosper2019.csv'
3041 loan_default_reason_description 1/0/T/F/TRUE/FALSE Deceased 'prosper2019.csv'
3595 loan_default_reason 1/0/T/F/TRUE/FALSE 3 'prosper2019.csv'
.... ............................... .................. .......... .................
See problems(...) for more details.
pros_agg = pros_df%>%
mutate(yr = year(origination_date),mn = month(origination_date))%>%
#group_by(yr,mn, prosper_rating)%>%
group_by(origination_date,prosper_rating,yr,mn)%>%
summarize(avg_amt_borrowed = mean(amount_borrowed),
w_avg_rate = weighted.mean(borrower_rate,amount_borrowed))%>%
mutate(org_dt = ymd(paste(yr,mn,'01',sep = '-')))
pros_agg2 = pros_df%>%
mutate(yr = year(origination_date),mn = month(origination_date))%>%
group_by(yr,mn, prosper_rating)%>%
#group_by(origination_date,prosper_rating,yr,mn)%>%
summarize(avg_amt_borrowed = mean(amount_borrowed),
w_avg_rate = weighted.mean(borrower_rate,amount_borrowed))%>%
mutate(org_dt = ymd(paste(yr,mn,'01',sep = '-')))
A little about me before we begin:
Modeler at Bank of America
Avid Star Wars Fan

Amatuer user and ardent supporter of version control in data science

R user for the last 9 years.
Used R for modeling and development at American Credit Acceptance before coming to Bank of America.

- Introduced and championed Shiny for rapid application development
- Developed various models and process improvements using R tool kit
Bank of America work:

- Modeling Lead for Consumer Behavior Modeling:
- Worked on Risk Models for a year
- Currently working on Innovation team (Best job I think I will ever have):
- New modeling ideas
- New areas of the business that would benefit from modeling knowledge
- Application of deep learning
What is this presenation?
I enjoy the data visualization side of data science
ggplot was always go too for my data science needs
ggplot is an adoption of the precepts laid out in the Grammar of Graphics

ggplot doesn’t lend itself to interactivity
- several efforts have introduced interactivity
- ggvis
- rggobi
- iPlots
- htmlwidgets
- r2d3
htmlwidets and r2d3 are an adaptation of D3/javascript technology
D3 is a javascript package that is very important in data visualization
- Written in a language that integrates with web development
- Uses data to manipulate the web document through various objects (SVG)
- Released in 2011
Useful examples of power and interactivity:
Sunburst Example

D3 Pros and Cons
Pros - Very flexible and portable - Interactivity part of the dna - Looks very professional and polished
Cons - Very steep learning curve - API requires decent understanding of how javascript - Centered on webdevelopment instead of data science - ?Falling out of favor?
What is the answer?
Plot.ly is a solution with a simpler API and out of the box interactivity
Essentially plot.ly is API wrapper for several D3
How does the plotly package work
Key to understading package is understanding how it transforms the data
- Data enters in R formats
- Transformed to list format
- Tranformed to JSON format
Below is a useful diagram showing how the final presentation is done.

Plotly uses two key components:
- Data/Trace:
- Connection between data and visuals
- Traces have types (scatter plot, histograms, sunburst, etc.)
- Trace types have specific atttributes that can be defined.
- Layout
ggplotly() to the rescue
If you have plots in ggplot, you can start using plotly with just a simple function call on most ggplot objects.
Let’s make a ggplot from Propser data.
pros_scat = pros_df_sam%>%
ggplot(aes(principal_paid, interest_paid))+geom_point(aes(color = factor(prosper_rating)))
pros_scat

ggplotly(pros_scat)
NA
pros_df_sam%>%
plot_ly(x = ~principal_paid, y = ~interest_paid)%>%
add_markers(color = ~prosper_rating)
Do it again, but with lines:
amt_fin_p = pros_agg%>%
ggplot(aes(origination_date,avg_amt_borrowed))+
geom_line(aes(color = prosper_rating))
amt_fin_p

Simple ggplotly command adds the tooltip, zooming,
ggplotly(amt_fin_p)
Now the plotly syntax:
Same Graph different syntax (using the add_* trace addition)
pros_agg%>%
ungroup()%>%
plot_ly(x = ~origination_date, y = ~avg_amt_borrowed)%>%
add_lines(color = ~prosper_rating)
Histogram:
Histogram with muiltple factors
Now lets do this with 2 dimentions
subplt = subplot(
pros_df_sam%>%plot_ly(x = ~principal_paid, color = I("black"),type = 'histogram'),
plotly_empty(),
pros_df_sam%>%plot_ly(x = ~principal_paid,y = ~interest_paid,type = 'histogram2dcontour'),
pros_df_sam%>%plot_ly(y = ~interest_paid,color = I("black"),type = 'histogram'),
nrows = 2,
heights = c(0.2,0.8),
widths = c(0.8,0.2),
shareX = TRUE,
shareY = TRUE
)
No trace type specified and no positional attributes specifiedNo trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
No scatter mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
p = layout(subplt, showlegend = FALSE)
p
NA
x <- rnorm(1000)
y <- rnorm(1000)
s <- subplot(
plot_ly(x = x, color = I("black"), type = 'histogram'),
plotly_empty(),
plot_ly(x = x, y = y, type = 'histogram2dcontour', showscale = F),
plot_ly(y = y, color = I("black"), type = 'histogram'),
nrows = 2, heights = c(0.2, 0.8), widths = c(0.8, 0.2),
shareX = TRUE, shareY = TRUE, titleX = FALSE, titleY = FALSE
)
No trace type specified and no positional attributes specifiedNo trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
No scatter mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
p <- layout(s, showlegend = FALSE)
p
What does that look like in ggplot? Doable but not intuitive.
library(gridExtra)
hist_top <- pros_df_sam%>%ggplot(aes(principal_paid))+geom_histogram()
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
theme(axis.ticks=element_blank(),
panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
scatter <- ggplot()+geom_density_2d(aes(pros_df_sam$principal_balance, pros_df_sam$interest_paid))
hist_right <- pros_df_sam%>%ggplot(aes(interest_paid))+geom_histogram()+coord_flip()
grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))

Ack. Not as good.
Now some plots that are very difficult to do in ggplot and rely on the interactivity heavily.
Sunburst
---
title: "Plot.ly: How I learned a (very little) java script and love interactive plots:"
author: "David Grinder"
output: html_notebook
---

```{r, echo= FALSE}
#Import libraries

library(plotly)
library(listviewer)
library(tidyverse)
library(lubridate)
```

```{r, echo = FALSE}
#Importing data
pros_df = read_csv("prosper2019.csv")
pros_df_sam = pros_df[sample(nrow(pros_df),10000),]
```

```{r}

pros_agg = pros_df%>%
  mutate(yr = year(origination_date),mn = month(origination_date))%>%
  #group_by(yr,mn, prosper_rating)%>%
  group_by(origination_date,prosper_rating,yr,mn)%>%
  summarize(avg_amt_borrowed = mean(amount_borrowed),
            w_avg_rate = weighted.mean(borrower_rate,amount_borrowed))%>%
  mutate(org_dt = ymd(paste(yr,mn,'01',sep = '-')))
  
pros_agg2 = pros_df%>%
  mutate(yr = year(origination_date),mn = month(origination_date))%>%
  group_by(yr,mn, prosper_rating)%>%
  #group_by(origination_date,prosper_rating,yr,mn)%>%
  summarize(avg_amt_borrowed = mean(amount_borrowed),
            w_avg_rate = weighted.mean(borrower_rate,amount_borrowed))%>%
  mutate(org_dt = ymd(paste(yr,mn,'01',sep = '-')))


```



# A little about me before we begin:


Modeler at Bank of America

Avid Star Wars Fan

![](return_of_jedi.jpg)

Amatuer user and ardent supporter of version control in data science

![](git.png)

R user for the last 9 years.

Used R for modeling and development at American Credit Acceptance before coming to Bank of America.

![](ACA.png)

- Introduced and championed Shiny for rapid application development
- Developed various models and process improvements using R tool kit

**Bank of America work:**

![](https://www.underconsideration.com/brandnew/archives/bank_of_america_logo_animation_new_a.gif)

- Modeling Lead for Consumer Behavior Modeling:
- Worked on Risk Models for a year
- Currently working on Innovation team (Best job I think I will ever have):
  - New modeling ideas
  - New areas of the business that would benefit from modeling knowledge
  - Application of deep learning

# **What is this presenation?**

# **I enjoy the data visualization side of data science**

## **ggplot was always go too for my data science needs**

- simple intuitive syntax to write and read
- wide range of useful geoms to solve common modeling problems
- integrates well with the rest of tidyverse

- (Big step forward from lattice and base graphics)



### **ggplot is an adoption of the precepts laid out in the Grammar of Graphics** 

![](GG_concepts.png)

## ggplot doesn't lend itself to interactivity

- several efforts have introduced interactivity 
  - ggvis
  - rggobi
  - iPlots
  - htmlwidgets
  - r2d3

# htmlwidets and r2d3 are an adaptation of D3/javascript technology

## D3 is a javascript package that is very important in data visualization

- Written in a language that integrates with web development
- Uses data to manipulate the web document through various objects (SVG)
- Released in 2011

### Useful examples of power and interactivity:

# Sunburst Example

![](https://i.stack.imgur.com/H6O2K.gif)

# Density over time

![](https://media.giphy.com/media/NTjiuskIME6awKn1nD/giphy.gif)

[Decision Tree Example](http://bl.ocks.org/fractalytics/raw/495b63cf671b4c487bc40801366384e0/)


# **D3 Pros and Cons**

*Pros*
- Very flexible and portable
- Interactivity part of the dna
- Looks very professional and polished

*Cons*
- Very steep learning curve
  - API requires decent understanding of how javascript
- Centered on webdevelopment instead of data science
- ?Falling out of favor?

## What is the answer?

# **Plot.ly is a solution with a simpler API and out of the box interactivity**

Essentially plot.ly is API wrapper for several D3


# **How does the plotly package work**

Key to understading package is understanding how it transforms the data

- Data enters in R formats
- Transformed to list format 
- Tranformed to JSON format

Below is a useful diagram showing how the final presentation is done.

![](plotly_data_transform.svg)

**Plotly uses two key components:**

1. Data/Trace:
  - Connection between data and visuals
  - Traces have types (scatter plot, histograms, sunburst, etc.)
  - Trace types have specific atttributes that can be defined.
2. Layout


# **ggplotly() to the rescue**

If you have plots in ggplot, you can start using plotly with just a simple function call on most ggplot objects.


Let's make a ggplot from Propser data.  
```{r}
pros_scat = pros_df_sam%>%
  ggplot(aes(principal_paid, interest_paid))+geom_point(aes(color = factor(prosper_rating)))

pros_scat
```



```{r}
ggplotly(pros_scat)

```

```{r}


pros_df_sam%>%
  plot_ly(x = ~principal_paid, y = ~interest_paid)%>%
  add_markers(color = ~prosper_rating)
```



Do it again, but with lines:
```{r}
amt_fin_p = pros_agg%>%
  ggplot(aes(origination_date,avg_amt_borrowed))+
  geom_line(aes(color = prosper_rating))

amt_fin_p
```

Simple ggplotly command adds the tooltip, zooming, 

```{r}
ggplotly(amt_fin_p)
```

Now the plotly syntax:

```{r}
pros_agg%>%
  ungroup()%>%
  plot_ly(x = ~origination_date, y = ~avg_amt_borrowed, color = ~prosper_rating, type = 'scatter', mode = 'lines')
#  add_lines(color = ~prosper_rating)
```
Same Graph different syntax (using the add_* trace addition)
```{r}
pros_agg%>%
  ungroup()%>%
  plot_ly(x = ~origination_date, y = ~avg_amt_borrowed)%>%
  add_lines(color = ~prosper_rating)
```

Histogram:
```{r}
pros_df_sam%>%
  plot_ly(x = ~borrower_rate)%>%
  add_histogram()
```

Histogram with muiltple factors
```{r}

pros_df_sam%>%
  plot_ly(x = ~factor(term), color = ~prosper_rating)%>%
  add_histogram()

```

Now lets do this with 2 dimentions

```{r}

pros_df%>%
  plot_ly(
    x = ~principal_paid,
    y = ~interest_paid,
    type = 'histogram2dcontour'
  )
  
  
  
```

```{r}

subplt = subplot(
  
  pros_df_sam%>%plot_ly(x = ~principal_paid, color = I("black"),type = 'histogram'),
  plotly_empty(),
  pros_df_sam%>%plot_ly(x = ~principal_paid,y = ~interest_paid,type = 'histogram2dcontour'),
  pros_df_sam%>%plot_ly(y = ~interest_paid,color = I("black"),type = 'histogram'),
  nrows = 2,  
  heights = c(0.2,0.8), 
  widths = c(0.8,0.2),
  shareX = TRUE,
  shareY = TRUE
)

p = layout(subplt, showlegend = FALSE)
  
p
  
```




```{r}
x <- rnorm(1000)
y <- rnorm(1000)
s <- subplot(
  plot_ly(x = x, color = I("black"), type = 'histogram'), 
  plotly_empty(), 
  plot_ly(x = x, y = y, type = 'histogram2dcontour', showscale = F), 
  plot_ly(y = y, color = I("black"), type = 'histogram'),
  nrows = 2, heights = c(0.2, 0.8), widths = c(0.8, 0.2), 
  shareX = TRUE, shareY = TRUE, titleX = FALSE, titleY = FALSE
)

p <- layout(s, showlegend = FALSE)

p
```


What does that look like in ggplot?  Doable but not intuitive.
```{r}
library(gridExtra)

hist_top <- pros_df_sam%>%ggplot(aes(principal_paid))+geom_histogram()
empty <- ggplot()+geom_point(aes(1,1), colour="white")+
         theme(axis.ticks=element_blank(), 
               panel.background=element_blank(), 
               axis.text.x=element_blank(), axis.text.y=element_blank(),           
               axis.title.x=element_blank(), axis.title.y=element_blank())

scatter <- ggplot()+geom_density_2d(aes(pros_df_sam$principal_balance, pros_df_sam$interest_paid))

hist_right <- pros_df_sam%>%ggplot(aes(interest_paid))+geom_histogram()+coord_flip()

grid.arrange(hist_top, empty, scatter, hist_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
```

Ack.  Not as good.  


Now some plots that are very difficult to do in ggplot and rely on the interactivity heavily.

Sunburst

```{r}
```

